Goto

Collaborating Authors

 streaming pca


Streaming PCA for Markovian Data

Neural Information Processing Systems

Since its inception in 1982, Oja's algorithm has become an established method for streaming principle component analysis (PCA). We study the problem of streaming PCA, where the data-points are sampled from an irreducible, aperiodic, and reversible Markov chain starting in stationarity. Our goal is to estimate the top eigenvector of the unknown covariance matrix of the stationary distribution. This setting has implications in scenarios where data can solely be sampled from a Markov Chain Monte Carlo (MCMC) type algorithm, and the objective is to perform inference on parameters of the stationary distribution. Most convergence guarantees for Oja's algorithm in the literature assume that the data-points are sampled IID. For data streams with Markovian dependence, one typically downsamples the data to get a nearly independent data stream. In this paper, we obtain the first near-optimal rate for Oja's algorithm on the entire data, where we remove the logarithmic dependence on the sample size, $n$, resulting from throwing data away in downsampling strategies.


Memory Limited, Streaming PCA

Neural Information Processing Systems

We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, $p$-dimensional samples are presented sequentially, and the goal is to produce the $k$-dimensional subspace that best approximates these points. Standard algorithms require $O(p^2)$ memory; meanwhile no algorithm can do better than $O(kp)$ memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the {\em spiked covariance model}, where $p$-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can be recovered when the number of samples, $n$, scales proportionally with the dimension, $p$.


Streaming PCA for Markovian Data

Neural Information Processing Systems

Since its inception in 1982, Oja's algorithm has become an established method for streaming principle component analysis (PCA). We study the problem of streaming PCA, where the data-points are sampled from an irreducible, aperiodic, and reversible Markov chain starting in stationarity. Our goal is to estimate the top eigenvector of the unknown covariance matrix of the stationary distribution. This setting has implications in scenarios where data can solely be sampled from a Markov Chain Monte Carlo (MCMC) type algorithm, and the objective is to perform inference on parameters of the stationary distribution. Most convergence guarantees for Oja's algorithm in the literature assume that the data-points are sampled IID.


Efficient Convex Relaxations for Streaming PCA

Neural Information Processing Systems

We revisit two algorithms, matrix stochastic gradient (MSG) and \ell_2 -regularized MSG (RMSG), that are instances of stochastic gradient descent (SGD) on a convex relaxation to principal component analysis (PCA). These algorithms have been shown to outperform Oja's algorithm, empirically, in terms of the iteration complexity, and to have runtime comparable with Oja's. However, these findings are not supported by existing theoretical results. While the iteration complexity bound for \ell_2 -RMSG was recently shown to match that of Oja's algorithm, its theoretical efficiency was left as an open problem. In this work, we give improved bounds on per iteration cost of mini-batched variants of both MSG and \ell_2 -RMSG and arrive at an algorithm with total computational complexity matching that of Oja's algorithm.


Efficient Convex Relaxations for Streaming PCA

Arora, Raman, Marinov, Teodor Vanislavov

Neural Information Processing Systems

We revisit two algorithms, matrix stochastic gradient (MSG) and $\ell_2$-regularized MSG (RMSG), that are instances of stochastic gradient descent (SGD) on a convex relaxation to principal component analysis (PCA). These algorithms have been shown to outperform Oja's algorithm, empirically, in terms of the iteration complexity, and to have runtime comparable with Oja's. However, these findings are not supported by existing theoretical results. While the iteration complexity bound for $\ell_2$-RMSG was recently shown to match that of Oja's algorithm, its theoretical efficiency was left as an open problem. In this work, we give improved bounds on per iteration cost of mini-batched variants of both MSG and $\ell_2$-RMSG and arrive at an algorithm with total computational complexity matching that of Oja's algorithm.


Memory Limited, Streaming PCA

Mitliagkas, Ioannis, Caramanis, Constantine, Jain, Prateek

Neural Information Processing Systems

We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, $p$-dimensional samples are presented sequentially, and the goal is to produce the $k$-dimensional subspace that best approximates these points. Standard algorithms require $O(p 2)$ memory; meanwhile no algorithm can do better than $O(kp)$ memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the {\em spiked covariance model}, where $p$-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can be recovered when the number of samples, $n$, scales proportionally with the dimension, $p$.


AdaOja: Adaptive Learning Rates for Streaming PCA

Henriksen, Amelia, Ward, Rachel

arXiv.org Machine Learning

Oja's algorithm has been the cornerstone of streaming methods in Principal Component Analysis (PCA) since it was first proposed in 1982. However, Oja's algorithm does not have a standardized choice of learning rate (step size) that both performs well in practice and truly conforms to the online streaming setting. In this paper, we propose a new learning rate scheme for Oja's method called AdaOja. This new algorithm requires only a single pass over the data and does not depend on knowing properties of the data set a priori. AdaOja is a novel variation of the Adagrad algorithm to Oja's algorithm in the single eigenvector case and extended to the multiple eigenvector case. We demonstrate for dense synthetic data, sparse real-world data and dense real-world data that AdaOja outperforms common learning rate choices for Oja's method. We also show that AdaOja performs comparably to state-of-the-art algorithms (History PCA and Streaming Power Method) in the same streaming PCA setting.


History PCA: A New Algorithm for Streaming PCA

Yang, Puyudi, Hsieh, Cho-Jui, Wang, Jane-Ling

arXiv.org Machine Learning

In this paper we propose a new algorithm for streaming principal component analysis. With limited memory, small devices cannot store all the samples in the high-dimensional regime. Streaming principal component analysis aims to find the $k$-dimensional subspace which can explain the most variation of the $d$-dimensional data points that come into memory sequentially. In order to deal with large $d$ and large $N$ (number of samples), most streaming PCA algorithms update the current model using only the incoming sample and then dump the information right away to save memory. However the information contained in previously streamed data could be useful. Motivated by this idea, we develop a new streaming PCA algorithm called History PCA that achieves this goal. By using $O(Bd)$ memory with $B\approx 10$ being the block size, our algorithm converges much faster than existing streaming PCA algorithms. By changing the number of inner iterations, the memory usage can be further reduced to $O(d)$ while maintaining a comparable convergence speed. We provide theoretical guarantees for the convergence of our algorithm along with the rate of convergence. We also demonstrate on synthetic and real world data sets that our algorithm compares favorably with other state-of-the-art streaming PCA methods in terms of the convergence speed and performance.


Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm

Jain, Prateek, Jin, Chi, Kakade, Sham M., Netrapalli, Praneeth, Sidford, Aaron

arXiv.org Machine Learning

This work provides improved guarantees for streaming principle component analysis (PCA). Given $A_1, \ldots, A_n\in \mathbb{R}^{d\times d}$ sampled independently from distributions satisfying $\mathbb{E}[A_i] = \Sigma$ for $\Sigma \succeq \mathbf{0}$, this work provides an $O(d)$-space linear-time single-pass streaming algorithm for estimating the top eigenvector of $\Sigma$. The algorithm nearly matches (and in certain cases improves upon) the accuracy obtained by the standard batch method that computes top eigenvector of the empirical covariance $\frac{1}{n} \sum_{i \in [n]} A_i$ as analyzed by the matrix Bernstein inequality. Moreover, to achieve constant accuracy, our algorithm improves upon the best previous known sample complexities of streaming algorithms by either a multiplicative factor of $O(d)$ or $1/\mathrm{gap}$ where $\mathrm{gap}$ is the relative distance between the top two eigenvalues of $\Sigma$. These results are achieved through a novel analysis of the classic Oja's algorithm, one of the oldest and most popular algorithms for streaming PCA. In particular, this work shows that simply picking a random initial point $w_0$ and applying the update rule $w_{i + 1} = w_i + \eta_i A_i w_i$ suffices to accurately estimate the top eigenvector, with a suitable choice of $\eta_i$. We believe our result sheds light on how to efficiently perform streaming PCA both in theory and in practice and we hope that our analysis may serve as the basis for analyzing many variants and extensions of streaming PCA.